MozartV1, Main, Exploration, bibRecord, 001860

An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features

Internal identifier: 001860 (Main/Exploration); previous: 001859; next: 001861

Authors: Ted Pedersen [United States]; Anagha Kulkarni [United States]; Roxana Angheluta [Belgium]; Zornitsa Kozareva [Spain]; Thamar Solorio [United States]

Source:

Lecture Notes in Computer Science [ISSN 0302-9743]; 2006.

RBID : ISTEX:CA0D47ABADFF95734272C07D75D52761282B3098

Abstract

Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order co-occurrence features. These contexts are then clustered in order to identify which are associated with different underlying named entities. The method also extracts descriptive and discriminating bigrams from each of the discovered clusters to serve as identifying labels. These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge sources. In this paper we apply this methodology in exactly the same way to Bulgarian, English, Romanian, and Spanish corpora. We find that it attains discrimination accuracy that is consistently well above that of a majority classifier, thus providing support for the hypothesis that the method is language independent.
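
To make the idea of second order co-occurrence features more concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy contexts, the function names, and the simplification of counting co-occurrence within a whole context (rather than from bigrams) are all assumptions made for illustration. Each context is represented by the average of the co-occurrence vectors of its words, so two contexts can look similar even when they share few words directly. The last function shows the majority classifier baseline mentioned in the abstract, which assigns every context to the most frequent entity.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy contexts around the ambiguous name "smith" (not from the paper).
CONTEXTS = [
    ["smith", "senator", "vote", "bill"],
    ["smith", "senator", "election", "campaign"],
    ["smith", "guitar", "album", "tour"],
    ["smith", "guitar", "concert", "band"],
]

def cooccurrence_vectors(contexts):
    """Map each word to counts of the other words seen in the same context."""
    vectors = defaultdict(Counter)
    for tokens in contexts:
        unique = set(tokens)
        for word in unique:
            for other in unique:
                if other != word:
                    vectors[word][other] += 1
    return vectors

def second_order_vector(tokens, word_vectors, target):
    """Average the co-occurrence vectors of the context words.

    The ambiguous name itself is skipped, both as a context word and as a
    dimension, since it appears in every context and carries no signal.
    """
    words = [t for t in tokens if t != target and t in word_vectors]
    total = Counter()
    for word in words:
        total.update(word_vectors[word])
    total.pop(target, None)
    n = len(words) or 1
    return {dim: count / n for dim, count in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(dim, 0.0) for dim, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def majority_baseline(labels):
    """Accuracy obtained by assigning every context to the most frequent label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

word_vectors = cooccurrence_vectors(CONTEXTS)
context_vectors = [second_order_vector(c, word_vectors, "smith") for c in CONTEXTS]
print(cosine(context_vectors[0], context_vectors[1]))  # two "politician" contexts
print(cosine(context_vectors[0], context_vectors[2]))  # "politician" vs "musician"
```

A clustering step would then group the context vectors by similarity; the sketch stops at the similarity measure because the choice of clustering algorithm is not stated in the abstract.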

URL:

https://api.istex.fr/document/CA0D47ABADFF95734272C07D75D52761282B3098/fulltext/pdf

DOI: 10.1007/11671299_23

Affiliations:

Belgium, Spain, United States

Links to previous steps (curation, corpus, ...)

to stream Istex, to step Corpus: 002404
to stream Istex, to step Curation: 001F63
to stream Istex, to step Checkpoint: 001217
to stream Main, to step Merge: 001881
to stream Main, to step Curation: 001860

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features</title>
<author><name sortKey="Pedersen, Ted" sort="Pedersen, Ted" uniqKey="Pedersen T" first="Ted" last="Pedersen">Ted Pedersen</name>
</author>
<author><name sortKey="Kulkarni, Anagha" sort="Kulkarni, Anagha" uniqKey="Kulkarni A" first="Anagha" last="Kulkarni">Anagha Kulkarni</name>
</author>
<author><name sortKey="Angheluta, Roxana" sort="Angheluta, Roxana" uniqKey="Angheluta R" first="Roxana" last="Angheluta">Roxana Angheluta</name>
</author>
<author><name sortKey="Kozareva, Zornitsa" sort="Kozareva, Zornitsa" uniqKey="Kozareva Z" first="Zornitsa" last="Kozareva">Zornitsa Kozareva</name>
</author>
<author><name sortKey="Solorio, Thamar" sort="Solorio, Thamar" uniqKey="Solorio T" first="Thamar" last="Solorio">Thamar Solorio</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:CA0D47ABADFF95734272C07D75D52761282B3098</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11671299_23</idno>
<idno type="url">https://api.istex.fr/document/CA0D47ABADFF95734272C07D75D52761282B3098/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002404</idno>
<idno type="wicri:Area/Istex/Curation">001F63</idno>
<idno type="wicri:Area/Istex/Checkpoint">001217</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Pedersen T:an:unsupervised:language</idno>
<idno type="wicri:Area/Main/Merge">001881</idno>
<idno type="wicri:Area/Main/Curation">001860</idno>
<idno type="wicri:Area/Main/Exploration">001860</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features</title>
<author><name sortKey="Pedersen, Ted" sort="Pedersen, Ted" uniqKey="Pedersen T" first="Ted" last="Pedersen">Ted Pedersen</name>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Minnesota, Duluth</wicri:regionArea>
<wicri:noRegion>Duluth</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Kulkarni, Anagha" sort="Kulkarni, Anagha" uniqKey="Kulkarni A" first="Anagha" last="Kulkarni">Anagha Kulkarni</name>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Minnesota, Duluth</wicri:regionArea>
<wicri:noRegion>Duluth</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Angheluta, Roxana" sort="Angheluta, Roxana" uniqKey="Angheluta R" first="Roxana" last="Angheluta">Roxana Angheluta</name>
<affiliation wicri:level="1"><country xml:lang="fr">Belgique</country>
<wicri:regionArea>Katholieke Universiteit Leuven</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Kozareva, Zornitsa" sort="Kozareva, Zornitsa" uniqKey="Kozareva Z" first="Zornitsa" last="Kozareva">Zornitsa Kozareva</name>
<affiliation wicri:level="1"><country xml:lang="fr">Espagne</country>
<wicri:regionArea>University of Alicante</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Solorio, Thamar" sort="Solorio, Thamar" uniqKey="Solorio T" first="Thamar" last="Solorio">Thamar Solorio</name>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Texas at El Paso</wicri:regionArea>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">CA0D47ABADFF95734272C07D75D52761282B3098</idno>
<idno type="DOI">10.1007/11671299_23</idno>
<idno type="ChapterID">Chap23</idno>
<idno type="ChapterID">23</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order co–occurrence features. These contexts are then clustered in order to identify which are associated with different underlying named entities. It also extracts descriptive and discriminating bigrams from each of the discovered clusters in order to serve as identifying labels. These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge sources. In this paper we apply this methodology in exactly the same way to Bulgarian, English, Romanian, and Spanish corpora. We find that it attains discrimination accuracy that is consistently well above that of a majority classifier, thus providing support for the hypothesis that the method is language independent.</div>
</front>
</TEI>
<affiliations><list><country><li>Belgique</li>
<li>Espagne</li>
<li>États-Unis</li>
</country>
</list>
<tree><country name="États-Unis"><noRegion><name sortKey="Pedersen, Ted" sort="Pedersen, Ted" uniqKey="Pedersen T" first="Ted" last="Pedersen">Ted Pedersen</name>
</noRegion>
<name sortKey="Kulkarni, Anagha" sort="Kulkarni, Anagha" uniqKey="Kulkarni A" first="Anagha" last="Kulkarni">Anagha Kulkarni</name>
<name sortKey="Solorio, Thamar" sort="Solorio, Thamar" uniqKey="Solorio T" first="Thamar" last="Solorio">Thamar Solorio</name>
</country>
<country name="Belgique"><noRegion><name sortKey="Angheluta, Roxana" sort="Angheluta, Roxana" uniqKey="Angheluta R" first="Roxana" last="Angheluta">Roxana Angheluta</name>
</noRegion>
</country>
<country name="Espagne"><noRegion><name sortKey="Kozareva, Zornitsa" sort="Kozareva, Zornitsa" uniqKey="Kozareva Z" first="Zornitsa" last="Kozareva">Zornitsa Kozareva</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
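
As an illustration of how the affiliation tree at the end of the record could be read programmatically, the following Python sketch groups author names by country using the standard library. The fragment is a trimmed copy of the `<affiliations><tree>` part of the record above (attributes reduced to `sortKey`), and the function name is hypothetical. Only this fragment is parsed because the full record uses the `wicri:` prefix without a visible namespace declaration, which a namespace-aware parser would likely reject unless the prefix is bound first.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <affiliations><tree> part of the record.
FRAGMENT = """<affiliations><tree>
<country name="États-Unis"><noRegion><name sortKey="Pedersen, Ted">Ted Pedersen</name>
</noRegion>
<name sortKey="Kulkarni, Anagha">Anagha Kulkarni</name>
<name sortKey="Solorio, Thamar">Thamar Solorio</name>
</country>
<country name="Belgique"><noRegion><name sortKey="Angheluta, Roxana">Roxana Angheluta</name>
</noRegion>
</country>
<country name="Espagne"><noRegion><name sortKey="Kozareva, Zornitsa">Zornitsa Kozareva</name>
</noRegion>
</country>
</tree></affiliations>"""

def authors_by_country(xml_text):
    """Return a dict mapping each country name to its author names, in document order."""
    root = ET.fromstring(xml_text)
    result = {}
    for country in root.iter("country"):
        country_name = country.get("name")
        if country_name is None:
            continue  # skip <country> elements that only wrap a plain list
        # iter() walks all descendants, so names nested in <noRegion> are included.
        result[country_name] = [name.text for name in country.iter("name")]
    return result

for country_name, authors in authors_by_country(FRAGMENT).items():
    print(country_name, authors)
```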

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Musique/explor/MozartV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001860 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001860 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Musique
   |area=    MozartV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:CA0D47ABADFF95734272C07D75D52761282B3098
   |texte=   An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features
}}

This area was generated with Dilib version V0.6.20.
Data generation: Sun Apr 10 15:06:14 2016. Site generation: Tue Feb 7 15:40:35 2023

	Mozart exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
